This paper focuses on uncertainty estimation for white matter lesion (WML) segmentation in magnetic resonance imaging (MRI). On the one hand, voxel-scale segmentation errors cause erroneous delineation of the lesions; on the other hand, lesion-scale detection errors lead to wrong lesion counts. Both of these factors are clinically relevant for the assessment of multiple sclerosis patients. This work aims to compare the ability of different voxel- and lesion-scale uncertainty measures to capture errors related to segmentation and lesion detection, respectively. Our main contributions are (i) proposing new measures of lesion-scale uncertainty that do not utilise voxel-scale uncertainties; (ii) extending an error retention curve analysis framework to the evaluation of lesion-scale uncertainty measures. Our results, obtained on a multi-center test set of 58 patients, demonstrate that the proposed lesion-scale measures achieve the best performance among the analysed measures. All code implementations are provided at https://github.com/NataliiaMolch/MS_WML_uncs.
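As a rough illustration of the error retention curve idea referred to above, the sketch below (our own construction, assuming binary NumPy masks and a per-voxel uncertainty map; function names are hypothetical, not the authors' code) progressively replaces the most uncertain voxels with their ground-truth values and tracks how the Dice score recovers:

```python
import numpy as np

def dice(gt, pred):
    # Dice similarity coefficient for binary masks
    inter = np.sum(gt * pred)
    denom = gt.sum() + pred.sum()
    return 2.0 * inter / denom if denom > 0 else 1.0

def error_retention_curve(gt, pred, unc, n_points=11):
    """Replace the most uncertain voxels with ground truth and track how
    Dice improves; an informative uncertainty map should recover the
    segmentation errors quickly (steep curve, large area under it)."""
    gt, pred, unc = gt.ravel(), pred.ravel(), unc.ravel()
    order = np.argsort(-unc)                    # most uncertain first
    fractions = np.linspace(0.0, 1.0, n_points)
    scores = []
    for f in fractions:
        k = int(f * len(order))
        corrected = pred.copy()
        corrected[order[:k]] = gt[order[:k]]    # oracle correction
        scores.append(dice(gt, corrected))
    return fractions, np.array(scores)
```

The area under such a curve then summarizes, in a single number, how well an uncertainty measure ranks erroneous voxels ahead of correct ones.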
translated by Google Translate
Distributional shift, or the mismatch between training data and deployment data, is a significant obstacle to the use of machine learning in high-stakes industrial applications such as autonomous driving and medicine. This creates a need to be able to assess how well ML models generalise, as well as the quality of their uncertainty estimates. Standard ML baseline datasets do not allow these properties to be assessed, as the training, validation, and test data are typically identically distributed. Recently, a range of dedicated benchmarks has appeared that include both distributionally matched and shifted data. Among these benchmarks, the Shifts dataset stands out in terms of the diversity of its tasks and the data modalities of its features. While most benchmarks are dominated by 2D image classification tasks, Shifts contains tabular weather forecasting, machine translation, and vehicle motion prediction tasks. This enables the robustness properties of models to be assessed on a diverse set of industrial-scale tasks, and either universal or directly applicable task-specific conclusions to be drawn. In this paper, we extend the Shifts dataset with two datasets sourced from industrial, high-risk applications of high societal importance. Specifically, we consider the tasks of segmentation of white matter multiple sclerosis lesions in 3D magnetic resonance brain images and the estimation of power consumption in marine cargo vessels. Both tasks feature ubiquitous distributional shifts and impose strict safety requirements due to the cost of errors. These new datasets will allow researchers to further explore robust generalization and uncertainty estimation in new situations. In this work, we provide a description of the datasets and baseline results for both tasks.
Deep learning methods have become the state of the art for the reconstruction of undersampled MR images. Particularly for cases where it is infeasible or impossible to acquire fully sampled ground-truth data, self-supervised machine learning methods for reconstruction are increasingly used. However, potential issues in the validation of such methods, as well as their generalizability, remain underexplored. In this paper, we investigate important aspects of the validation of self-supervised algorithms for the reconstruction of undersampled MR images: quantitative evaluation of prospective reconstructions, potential differences between prospective and retrospective reconstructions, and the suitability of commonly used quantitative metrics, as well as generalizability. Two self-supervised algorithms, based on self-supervised denoising and the deep image prior, were investigated. These methods are compared to a least-squares fit and a compressed sensing reconstruction using in vivo and phantom data. Their generalizability was tested with prospectively undersampled data that differ from the training data. We show that prospective reconstructions can exhibit serious distortions relative to retrospective reconstructions and the ground truth. Furthermore, pixel-wise quantitative metrics may fail to accurately capture differences in perceptual quality, in contrast to perceptual metrics. In addition, all methods showed potential for generalization; however, generalizability was affected more strongly by some shifts than by others. We further show that no-reference image metrics correspond well to human ratings of image quality for studying generalizability. Finally, we demonstrate that a well-tuned compressed sensing reconstruction and learned denoising perform similarly on all the data.
Resting-state functional magnetic resonance imaging (fMRI) is a powerful imaging technique for studying the functional development of the brain in utero. However, unpredictable and excessive fetal motion has limited its clinical applicability, as it causes substantial signal fluctuations that can systematically alter functional connectivity patterns. Previous studies have focused on the accurate estimation of motion parameters in the presence of large fetal head movements and used 3D single-step interpolation at each time point to recover a motion-free fMRI series. This does not guarantee that the reconstructed images correspond to the minimum-error representation of the fMRI time series given the acquired data. Here, we propose a novel technique based on four-dimensional iterative reconstruction of the scattered slices acquired in fetal fMRI. The accuracy of the proposed method was quantitatively evaluated on a set of real clinical fetal fMRI data. The results indicate improved reconstruction quality compared to the conventional 3D interpolation approach.
Quantitative assessment of the developing human brain in utero is crucial to fully understand neurodevelopment. Automated multi-tissue fetal brain segmentation algorithms are therefore being developed, which in turn require annotated data for training. However, the available annotated fetal brain datasets are limited in number and heterogeneity, hampering domain adaptation strategies for robust segmentation. In this context, we use FaBiAN, a Fetal Brain magnetic resonance Acquisition Numerical phantom, to simulate various realistic magnetic resonance images of the fetal brain along with their class labels. We demonstrate that these multiple synthetic annotated data, generated at no cost and further reconstructed with a target super-resolution technique, can be successfully used for domain adaptation of a deep learning method that segments seven brain tissues. Overall, segmentation accuracy is significantly enhanced, especially in the cortical gray matter, white matter, cerebellum, deep gray matter, and brainstem.
Brain aneurysm detection in Time-Of-Flight Magnetic Resonance Angiography (TOF-MRA) has undergone drastic improvements with the advent of deep learning (DL). However, the performance of supervised DL models heavily relies on the quantity of labeled samples. To mitigate the recurring bottleneck of voxel-wise label creation, we investigate the use of weak labels: oversized annotations that are considerably faster to create. We propose a deep learning algorithm for aneurysm detection that leverages weak labels during training. Furthermore, our model exploits prior anatomical knowledge by focusing only on plausible locations of aneurysm occurrence. We created a retrospective dataset of 284 TOF-MRA subjects (170 females), of whom 157 are patients (with 198 aneurysms) and 127 are controls. Our open-access TOF-MRA dataset, the largest of its kind in the community, is released on OpenNeuro. To assess the generalizability of the model, we participated in a challenge for aneurysm detection with TOF-MRA data (93 patients, 20 controls, 125 aneurysms). Weak labels were four times faster to create than their voxel-wise counterparts. When using prior anatomical knowledge, our network achieved a sensitivity of 80% on the internal data, with a false positive (FP) rate of 1.2 per patient. On the public challenge, the sensitivity was 68% (FP rate = 2.5), ranking 4th out of 18 on the open leaderboard. We found no significant difference in sensitivity with respect to aneurysm rupture status (p = 0.75), location (p = 0.72), or size (p = 0.15). Our code is available for reproducibility.
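Weak labels of the kind described above could, for instance, be derived from an existing voxel-wise mask as an enclosing sphere. The sketch below is only an illustrative stand-in (in the study itself, weak labels were drawn by annotators); the function name and the sphere construction are our own assumptions:

```python
import numpy as np

def weak_label_sphere(mask):
    """Turn a voxel-wise lesion mask into a 'weak' spherical label:
    a sphere centered on the lesion that encloses all of its voxels.
    Coarser than an exact contour, but much faster to produce."""
    coords = np.argwhere(mask)                        # (N, ndim) voxel indices
    center = coords.mean(axis=0)
    radius = np.linalg.norm(coords - center, axis=1).max() + 0.5
    grid = np.indices(mask.shape).reshape(mask.ndim, -1).T
    dist = np.linalg.norm(grid - center, axis=1)
    return (dist <= radius).reshape(mask.shape).astype(np.uint8)
```

By construction, the weak label always covers the original mask, which is the property that makes oversized annotations usable as training targets.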
In this paper, we propose a novel technique, namely INVALIDATOR, to automatically assess the correctness of APR-generated patches via semantic and syntactic reasoning. INVALIDATOR reasons about program semantics via program invariants, while it also captures program syntax via language semantics learned from a large code corpus using a pre-trained language model. Given a buggy program and the developer-patched program, INVALIDATOR infers likely invariants on both programs. Then, INVALIDATOR determines that an APR-generated patch overfits if: (1) it violates correct specifications or (2) it maintains erroneous behaviors of the original buggy program. In case our approach fails to determine an overfitting patch based on invariants, INVALIDATOR utilizes a model trained on labeled patches to assess patch correctness based on program syntax. The benefits of INVALIDATOR are three-fold. First, INVALIDATOR is able to leverage both semantic and syntactic reasoning to enhance its discriminative capability. Second, INVALIDATOR does not require new test cases to be generated; it relies only on the current test suite and uses invariant inference to generalize the behaviors of a program. Third, INVALIDATOR is fully automated. We have conducted our experiments on a dataset of 885 patches generated on real-world programs in Defects4J. Experimental results show that INVALIDATOR correctly classifies 79% of overfitting patches, detecting 23% more overfitting patches than the best baseline. INVALIDATOR also substantially outperforms the best baselines by 14% and 19% in terms of Accuracy and F-measure, respectively.
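The two-stage decision rule sketched in the abstract can be caricatured as follows. Sets of predicate strings stand in for real inferred invariants, and the function signature is hypothetical, not INVALIDATOR's actual API:

```python
def classify_patch(patched_invs, correct_spec_invs, buggy_error_invs,
                   syntax_model=None):
    """Schematic two-stage check: semantic reasoning over invariant
    sets first, with a learned syntactic model as the fallback."""
    # (1) the patch violates an invariant required by the correct spec
    if not correct_spec_invs <= patched_invs:
        return "overfitting"
    # (2) the patch preserves an invariant characterizing the bug
    if patched_invs & buggy_error_invs:
        return "overfitting"
    # semantic check inconclusive: fall back to a syntax-based classifier
    if syntax_model is not None:
        return "overfitting" if syntax_model(patched_invs) else "correct"
    return "correct"
```

The point of the structure is that the cheap, interpretable semantic checks fire first, and the opaque learned model is consulted only when they are inconclusive.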
Neural networks can be trained to solve regression problems by using gradient-based methods to minimize the square loss. However, practitioners often prefer to reformulate regression as a classification problem, observing that training on the cross entropy loss results in better performance. By focusing on two-layer ReLU networks, which can be fully characterized by measures over their feature space, we explore how the implicit bias induced by gradient-based optimization could partly explain the above phenomenon. We provide theoretical evidence that the regression formulation yields a measure whose support can differ greatly from that for classification, in the case of one-dimensional data. Our proposed optimal supports correspond directly to the features learned by the input layer of the network. The different nature of these supports sheds light on possible optimization difficulties the square loss could encounter during training, and we present empirical results illustrating this phenomenon.
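The regression-as-classification reformulation the abstract refers to is commonly implemented by binning the target range, training with cross entropy over the bins, and decoding a softmax back to a scalar. A minimal sketch of the two conversion steps (helper names are our own, not from the paper):

```python
import numpy as np

def to_class_targets(y, n_bins, y_min, y_max):
    """Regression-as-classification: map scalar targets to bin indices
    over a uniform partition of [y_min, y_max]."""
    edges = np.linspace(y_min, y_max, n_bins + 1)
    return np.clip(np.digitize(y, edges[1:-1]), 0, n_bins - 1)

def decode(probs, n_bins, y_min, y_max):
    """Decode a probability vector over bins back to a scalar as the
    expected bin center."""
    centers = y_min + (np.arange(n_bins) + 0.5) * (y_max - y_min) / n_bins
    return probs @ centers
```

A network trained with cross entropy on `to_class_targets(y, ...)` can then be evaluated as a regressor through `decode`, which is the setting whose implicit bias the paper analyzes.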
Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions. While these capabilities have led to widespread adoption, most LLMs are developed by resource-rich organizations and are frequently kept from the public. As a step towards democratizing this powerful technology, we present BLOOM, a 176B-parameter open-access language model designed and built thanks to a collaboration of hundreds of researchers. BLOOM is a decoder-only Transformer language model that was trained on the ROOTS corpus, a dataset comprising hundreds of sources in 46 natural and 13 programming languages (59 in total). We find that BLOOM achieves competitive performance on a wide variety of benchmarks, with stronger results after undergoing multitask prompted finetuning. To facilitate future research and applications using LLMs, we publicly release our models and code under the Responsible AI License.
Studying the properties of stochastic noise used to optimize complex non-convex functions has been an active area of research in machine learning. Prior work has shown that the noise of stochastic gradient descent improves optimization by overcoming undesirable obstacles in the loss landscape. Moreover, injecting artificial Gaussian noise has become a popular idea for quickly escaping saddle points. Indeed, in the absence of reliable gradient information, noise is used to explore the landscape, but it is unclear which type of noise is optimal in terms of exploration ability. In order to narrow this gap in our knowledge, we study a general class of continuous-time non-Markovian processes based on fractional Brownian motion that allows the correlations of the process increments to be increased. This generalizes processes based on Brownian motion, such as the Ornstein-Uhlenbeck process. We demonstrate how to discretize such processes, which leads to the new algorithm fPGD. This method is a generalization of the known algorithms PGD and Anti-PGD. We study the properties of fPGD both theoretically and empirically, showing that it possesses exploration abilities that are, in some cases, favorable over those of PGD and Anti-PGD. These results open the field to novel ways of exploiting noise for the training of machine learning models.
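A minimal sketch of perturbed gradient descent with fractional Gaussian noise, in the spirit of the method described above. The noise is generated by Cholesky factorization of the exact fractional-Gaussian-noise covariance; this is a generic textbook construction, not the authors' discretization, and all names are our own. A Hurst index of 0.5 recovers i.i.d. Gaussian perturbations (PGD-style), while values below 0.5 give anti-correlated increments:

```python
import numpy as np

def fgn_increments(n, hurst, rng):
    """Sample n increments of fractional Gaussian noise by Cholesky
    factorization of its exact autocovariance gamma(k)."""
    k = np.arange(n)
    gamma = 0.5 * (np.abs(k + 1) ** (2 * hurst)
                   + np.abs(k - 1) ** (2 * hurst)
                   - 2 * np.abs(k) ** (2 * hurst))
    cov = gamma[np.abs(k[:, None] - k[None, :])]
    L = np.linalg.cholesky(cov + 1e-12 * np.eye(n))  # jitter for stability
    return L @ rng.standard_normal(n)

def perturbed_gd(grad, x0, steps, lr, sigma, hurst, rng):
    """Gradient descent with fractional Gaussian perturbations,
    one independent noise track per coordinate."""
    d = len(x0)
    noise = np.stack([fgn_increments(steps, hurst, rng) for _ in range(d)],
                     axis=1)                          # shape (steps, d)
    x = np.asarray(x0, dtype=float)
    for t in range(steps):
        x = x - lr * grad(x) + sigma * noise[t]
    return x
```

The Cholesky route is exact but O(n^3) in the number of steps; it is only meant to make the role of the Hurst index concrete.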